Trade&Ahead

Context

The stock market has historically proven to be a good place to invest and save for the future. There are many compelling reasons to invest in stocks: it can help fight inflation, create wealth, and provide some tax benefits. Steady returns sustained over a long period can grow an investment far more than one might expect, and thanks to the power of compounding, the earlier one starts investing, the larger the corpus one can build for retirement. Overall, investing in stocks can help meet life's financial aspirations.

It is important to maintain a diversified portfolio when investing in stocks in order to improve earnings across different market conditions. A diversified portfolio tends to yield steadier returns and face lower risk, because gains in some holdings temper losses in others when the market is down. It is easy to get lost in the sea of financial metrics used to determine the worth of a stock, and repeating that analysis for many stocks to identify the right picks for an individual is tedious. A cluster analysis groups stocks that exhibit similar characteristics and separates those that have little in common. This helps investors analyze stocks across different market segments and protect the portfolio against risks that could expose it to losses.

Objective

Trade&Ahead is a financial consultancy firm that provides its customers with personalized investment strategies. The firm has hired you as a Data Scientist and provided data comprising stock prices and some financial indicators for a few companies listed on the New York Stock Exchange. Your tasks are to analyze the data, group the stocks based on the attributes provided, and share insights about the characteristics of each group.

Data Description

The data consists of stock prices and financial indicators such as ROE (return on equity), earnings per share, and the P/E (price-to-earnings) ratio.

Data Dictionary

Importing necessary libraries and data

Loading Data

Data Overview
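A minimal sketch of the overview checks, assuming the data has already been loaded into a pandas DataFrame. The helper name `overview` is hypothetical and not taken from the original notebook; it reports the size of the data, the mix of column types, and the missing and duplicate counts that the preprocessing step relies on.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Summarize shape, column types, missing cells, and duplicate rows."""
    return {
        "n_rows": df.shape[0],
        "n_cols": df.shape[1],
        # Number of columns per dtype, e.g. {"float64": 11, "object": 4}
        "dtypes": df.dtypes.astype(str).value_counts().to_dict(),
        # Total number of missing cells across the whole frame
        "missing": int(df.isna().sum().sum()),
        # Rows that exactly repeat an earlier row
        "duplicates": int(df.duplicated().sum()),
    }
```

Calling `overview(df)` once after loading gives a quick sanity check before any cleaning.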

Data Preprocessing

Observations

Observations

Exploratory Data Analysis (EDA)

Univariate Analysis

Observations

Bivariate Analysis

Check for correlations
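One way to make a correlation check concrete is to list the most strongly correlated pairs of numeric columns. This is a sketch assuming Pearson correlation via pandas' `DataFrame.corr`; the helper `top_correlations` is hypothetical, and the original notebook may have used a heatmap instead.

```python
import pandas as pd


def top_correlations(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n most strongly correlated pairs of numeric columns (Pearson)."""
    corr = df.select_dtypes(include="number").corr()
    cols = list(corr.columns)
    # Each unordered pair once; the diagonal (self-correlation) is skipped
    pairs = pd.Series({(a, b): corr.loc[a, b]
                       for i, a in enumerate(cols) for b in cols[i + 1:]})
    # Rank by absolute value so strong negative correlations are included too
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)
```

Ranking by absolute value treats a correlation of -0.9 as just as notable as +0.9, which matters when deciding whether two features carry redundant information.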

Observations

Pairplot

Data Preprocessing continued

Note:

Based on a limited understanding of stock fundamentals, the data may contain some anomalies; for example, could the P/E Ratio and P/B Ratio columns be mislabeled (swapped)? Some values look implausible, but without the underlying source data it is hard to confirm this or correct it. We will proceed as though all the data is accurate; at least there are no duplicate or missing entries.
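The labeling concern can be partially tested, because the P/E ratio is by definition the share price divided by earnings per share. The sketch below measures how often the reported P/E agrees with price / EPS. The column names (`Current Price`, `Earnings Per Share`, `P/E Ratio`) and the 5% tolerance are assumptions, not confirmed from the data dictionary; a low agreement rate would support the suspicion but would not prove a swap.

```python
import numpy as np
import pandas as pd


def pe_consistency(df: pd.DataFrame,
                   price_col: str = "Current Price",     # assumed column name
                   eps_col: str = "Earnings Per Share",  # assumed column name
                   pe_col: str = "P/E Ratio",            # assumed column name
                   rtol: float = 0.05) -> float:
    """Fraction of rows whose reported P/E is within rtol of price / EPS."""
    eps = df[eps_col].where(df[eps_col] != 0)  # zero EPS -> NaN, avoids division by zero
    implied_pe = df[price_col] / eps
    valid = implied_pe.notna()                  # compare only rows with a computable P/E
    if not valid.any():
        return float("nan")
    close = np.isclose(df.loc[valid, pe_col].to_numpy(),
                       implied_pe[valid].to_numpy(), rtol=rtol)
    return float(close.mean())
```

Running the same check with the P/B column in place of `pe_col` would show whether that column matches price / EPS better.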

Scale Data

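We need to scale the numerical data prior to clustering. Both K-means and hierarchical clustering with Euclidean distance are sensitive to feature magnitudes, so without scaling, features measured on large scales would dominate the distance calculations.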
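A minimal scaling sketch, assuming z-score standardization with scikit-learn's `StandardScaler` (the notebook does not state which scaler it used). Each numeric column becomes $(x - \mu)/\sigma$, so every feature has mean 0 and unit variance; the helper name `scale_numeric` is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Z-score every numeric column; non-numeric columns (names, sectors) are dropped."""
    num = df.select_dtypes(include="number")
    scaled = StandardScaler().fit_transform(num)
    # Keep the original column names and row index for later profiling
    return pd.DataFrame(scaled, columns=num.columns, index=df.index)
```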

K-means Clustering

Selecting $k$ with the Elbow Method

From the elbow curve, appropriate values of $k$ appear to be 4, 8, or 10. Let us check the silhouette scores.
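The elbow method fits K-means for a range of $k$ and looks for the point where the within-cluster sum of squares stops dropping sharply. This sketch uses scikit-learn's `inertia_`; the notebook may have plotted a different distortion measure, and `elbow_inertias`, the $k$ range, and the seed are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans


def elbow_inertias(X: np.ndarray, k_values=range(1, 11), seed: int = 1) -> dict:
    """Fit K-means for each k and record the within-cluster sum of squares (inertia)."""
    inertias = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        inertias[k] = model.inertia_
    return inertias
```

Plotting the returned values against $k$ gives the elbow curve; `X` would be the scaled numeric data.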

Selecting $k$ from Silhouette Score

From the silhouette scores, 3 or 4 appear to be good values for $k$.
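The mean silhouette score compares, for every point, its average distance to its own cluster with its average distance to the nearest other cluster; higher is better, with a maximum of 1. A sketch using scikit-learn's `silhouette_score`, where the helper name, $k$ range, and seed are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X: np.ndarray, k_values=range(2, 11), seed: int = 1) -> dict:
    """Mean silhouette score for each k (undefined for k=1, so the range starts at 2)."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores
```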

Finding the optimal number of clusters, $k$, with silhouette coefficients

Let us take $k=4$

Unfortunately, for every $k$ from 3 to 10 the silhouette plots include some samples with negative silhouette coefficients, i.e., points that are on average closer to a neighboring cluster than to their own. Still, $k=4$ shows a kink in the elbow curve and has a high silhouette score.
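The negative values seen in the silhouette plots can be quantified per cluster with scikit-learn's `silhouette_samples`. This sketch is an assumption about how to summarize those plots numerically, not the notebook's visualizer; `negative_silhouette_share` is a hypothetical name.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples


def negative_silhouette_share(X: np.ndarray, k: int = 4, seed: int = 1) -> dict:
    """Fit K-means with k clusters and return, per cluster, the share of points
    whose silhouette coefficient is negative."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    coeffs = silhouette_samples(X, labels)
    return {int(c): float((coeffs[labels == c] < 0).mean()) for c in np.unique(labels)}
```

A cluster with a large negative share is a candidate for being split or merged; a few negative points, as here, are common with real financial data.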

Insights

Hierarchical Clustering

Explore different linkage methods with Euclidean distance only

The cophenetic correlation is highest with average linkage (using Euclidean distance).
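The cophenetic correlation measures how faithfully a dendrogram preserves the original pairwise distances: it correlates the Euclidean distances with the heights at which each pair of points is first merged. A sketch with SciPy, where the helper name and the list of linkage methods are assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist


def cophenetic_by_linkage(X: np.ndarray,
                          methods=("single", "complete", "average",
                                   "weighted", "centroid", "ward")) -> dict:
    """Cophenetic correlation for each linkage method, using Euclidean distance."""
    dists = pdist(X, metric="euclidean")
    results = {}
    for method in methods:
        Z = linkage(X, method=method, metric="euclidean")
        corr, _ = cophenet(Z, dists)  # returns (correlation, cophenetic distances)
        results[method] = float(corr)
    return results
```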

Dendrograms for different linkage methods
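Drawing a dendrogram needs a plotting library, but the tree itself can be built and cut into flat clusters without one. A minimal sketch with SciPy, assuming average linkage (the method with the highest cophenetic correlation above); `flat_clusters` is a hypothetical helper.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage


def flat_clusters(X: np.ndarray, n_clusters: int, method: str = "average"):
    """Build a Euclidean hierarchical tree and cut it into n_clusters flat clusters.

    Returns the cluster labels (numbered from 1) and the left-to-right leaf order of the tree."""
    Z = linkage(X, method=method, metric="euclidean")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return labels, leaves_list(Z).tolist()
```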

Observations

Cluster Profiling
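Cluster profiling typically summarizes each cluster by the average of every numeric feature and the number of stocks it contains. A sketch assuming the cluster labels come from either clustering method and are aligned with the rows of the unscaled DataFrame; `profile_clusters` is a hypothetical helper.

```python
import pandas as pd


def profile_clusters(df: pd.DataFrame, labels, label_col: str = "cluster") -> pd.DataFrame:
    """Mean of each numeric column per cluster, plus the number of stocks per cluster."""
    out = df.assign(**{label_col: labels})  # copy; the input frame is not modified
    numeric = [c for c in out.select_dtypes(include="number").columns if c != label_col]
    profile = out.groupby(label_col)[numeric].mean()
    profile["count"] = out.groupby(label_col).size()
    return profile
```

Profiling on the unscaled data keeps the means in their original units (dollars, ratios), which makes them easier to interpret.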

Observations

Cluster Profiling Part 2

Now the clusters appear to have more variability.

Observations

Insights

K-means vs Hierarchical Clustering
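One way to compare the two methods directly is to cross-tabulate their labels and compute the adjusted Rand index (ARI), which is 1.0 for identical groupings regardless of how the clusters are numbered and near 0.0 for chance-level agreement. This sketch is an assumption about how the comparison could be quantified; `compare_labelings` is hypothetical.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score


def compare_labelings(kmeans_labels, hier_labels):
    """Cross-tabulate two clusterings of the same stocks and score their agreement."""
    table = pd.crosstab(pd.Series(kmeans_labels, name="kmeans"),
                        pd.Series(hier_labels, name="hierarchical"))
    return table, adjusted_rand_score(kmeans_labels, hier_labels)
```

Because ARI ignores label numbering, it works even though SciPy's `fcluster` numbers clusters from 1 while K-means numbers them from 0.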

K-means

Hierarchical Clustering

Actionable Insights and Recommendations